columns with unique values in them

Null value imputation for enrolled university
Null value imputation for last new job
Null value imputation for company type
Null value imputation for experience
Null value imputation for company size

outliers in the data

UNIVARIATE ANALYSIS

CATEGORICAL COL

NUMERICAL COL

BIVARIATE ANALYSIS

NUM VS NUM

NUM VS CAT

CAT VS CAT

STATISTICS

Independence of attributes of categorical columns

H0: Categorical columns are independent

H1: Categorical columns are dependent
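
As a rough illustration of the test behind these hypotheses, the sketch below computes the Pearson chi-squared statistic for a contingency table using only the standard library. The counts are hypothetical and not taken from this data set, and the closed-form p-value used here is exact only for one degree of freedom (a 2 x 2 table, without continuity correction). In a typical workflow, `scipy.stats.chi2_contingency` would be the usual choice.

```python
import math

def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under H0 (the two columns are independent)
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical 2 x 2 contingency table of two categorical columns
observed = [[30, 10], [20, 40]]
stat, dof = chi_square_independence(observed)

# For dof == 1 only: P(chi2 > stat) = erfc(sqrt(stat / 2))
p_value = math.erfc(math.sqrt(stat / 2))
reject_h0 = p_value < 0.05  # assumed 5% significance level
```

A small p-value would lead to rejecting H0, meaning the two columns appear dependent.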

H0: mu1 = mu2

H1: mu1 <> mu2
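
The two-sample comparison above can be sketched as follows. This is a minimal standard-library version of Welch's t-test, which is an assumption on our part: the notebook does not state which variant was used. The sample values are hypothetical training hours, not values from the data set. The p-value would come from a t distribution with the returned degrees of freedom, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = variance(sample_a) / len(sample_a)  # squared standard error of mean A
    vb = variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, dof

# Hypothetical training hours for the two groups
changers = [20, 35, 50, 42, 28, 31, 44, 39]
stayers = [60, 72, 55, 80, 66, 75, 58, 70]
t_stat, dof = welch_t(changers, stayers)
```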

H0: mu=8

H1: mu<>8
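
A one-sample t statistic for H0: mu = 8 can be sketched with the standard library as below. The experience values are hypothetical and only illustrate the arithmetic. The p-value would be read from a t distribution with n - 1 degrees of freedom, for instance with `scipy.stats.ttest_1samp`.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1

# Hypothetical years of experience for people who changed jobs
experience = [3, 5, 10, 12, 2, 7, 15, 4, 6, 9]
t_stat, dof = one_sample_t(experience, mu0=8)
```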

From the above statistical analysis we can conclude the following:

1. From the chi-squared test, we conclude that all the categorical columns are independent of each other.
2. The mean training hours of employees who change jobs differs from that of employees who do not change jobs.
3. The average experience of people who change jobs is not 8 years, in contrast to the average experience of employees overall.

FEATURE ENGINEERING

dummy encoding
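
To make the mechanics of dummy encoding concrete, here is a small standard-library sketch. With pandas, `pandas.get_dummies(df, drop_first=True)` would normally do this in one call. The function name, the `company_type` prefix, and the category values are illustrative assumptions.

```python
def dummy_encode(values, prefix, drop_first=True):
    """Turn one categorical column into 0/1 indicator columns.

    With drop_first=True the first category (in sorted order) is dropped,
    so it is represented by all indicators being 0.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    return [
        {f"{prefix}_{cat}": int(value == cat) for cat in kept}
        for value in values
    ]

# Hypothetical values for a company type column
company_type = ["Pvt Ltd", "NGO", "Funded Startup", "Pvt Ltd"]
encoded = dummy_encode(company_type, prefix="company_type")
```

Dropping one level per column avoids perfectly collinear indicators, which matters most for linear models such as logistic regression.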

Correlation matrix after dummy encoding

No variable shows a high correlation with job change in the data set, and the correlation among the variables themselves is also not high.

MACHINE LEARNING

Logistic Regression

Important features

Decision tree

Random Forest Classifier

AdaBoost

Gradient Boost

XGBoost

Note: column names need to be changed, since XGBoost does not work with < signs in column names.
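
One way to do the renaming is sketched below. XGBoost has historically rejected feature names containing `<`, `[`, or `]`, and dummy-encoded columns built from values such as "<10" run into this. The column names and the replacement tokens (`lt_`, `gt_`) are hypothetical choices for illustration.

```python
import re

def sanitize_column_name(name):
    """Replace characters that XGBoost may reject in feature names."""
    name = name.replace("<", "lt_").replace(">", "gt_")
    # Square brackets are also rejected, so map them to underscores
    return re.sub(r"[\[\]]", "_", name)

# Hypothetical dummy-encoded column names
columns = ["company_size_<10", "experience_>20", "city_development_index"]
clean_columns = [sanitize_column_name(c) for c in columns]

# With a pandas DataFrame df this would typically be applied as:
# df.columns = [sanitize_column_name(c) for c in df.columns]
```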

Light GBM Classifier

KNN

Naive Bayes

1. The best recall scores here are given by Naive Bayes, Light GBM, and Gradient Boosting.
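
Since the models are compared on recall, a minimal sketch of that metric may help. It assumes that job change is encoded as the positive class 1, which is an assumption about the label encoding. The labels and predictions below are made up for illustration.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical labels and predictions (1 = changes job)
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
score = recall(y_true, y_pred)
```

High recall means few actual job changers are missed, at the possible cost of more false alarms.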

PLOT AUC ROC

Random Over Sampler
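
The idea behind random oversampling can be sketched with the standard library as follows. It resamples minority-class rows with replacement until every class matches the majority count. This is an illustrative stand-in, not the notebook's code. In practice `imblearn.over_sampling.RandomOverSampler` is the usual tool. The rows and labels are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        extra = rng.choices(pool, k=target - count)  # sampling with replacement
        out_rows.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_rows, out_labels

# Hypothetical imbalanced training data: four rows of class 0, one of class 1
X_train = [[1], [2], [3], [4], [5]]
y_train = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X_train, y_train)
```

Oversampling is generally applied to the training split only, so that duplicated rows do not leak into the evaluation data.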

Final Models after over sampling

PLOT AUC ROC

Model Comparison

cross validation

Grid search cv

Final model

cross val score and recall

LIME AND SHAP

Correctly Classified

Incorrectly Classified

Correctly Classified

Incorrectly Classified